
Advanced Analysis

Author: Ruifeng Gao
Updated: 2026-01-26
SeekSoul™ Online

Overview

This section summarizes common advanced single-cell transcriptome analysis methods to help researchers draw richer, biologically interpretable conclusions. We focus on practicality: for each method we outline its definition, applicable scenarios, platform parameters, and key points for interpreting results, along with application examples, common pitfalls, and optimization suggestions for data analysis on the cloud platform.

TIP

How to Read This Section

  • Since this section is long, we recommend using the table of contents on the right to quickly jump to topics of interest and expand specific content as needed.

Immune Repertoire Analysis

Definition

Immune Repertoire (IR) refers to the collection of all T-cell receptors (TCRs) and B-cell receptors (BCRs) in an individual's adaptive immune system at a specific time point. Its diversity stems from V(D)J gene rearrangement during lymphocyte development. As an important component of single-cell research, immune repertoire analysis assesses the status and dynamics of the immune system through high-throughput sequencing of TCRs and BCRs. By resolving the key CDR3 sequences, it can identify and quantify large numbers of T/B cell clones, revealing the characteristics of adaptive immune responses in health and disease (e.g., infection, tumors, and autoimmune diseases).


Significance

  • Reveal Immune Diversity: Assess the health status of the immune system and its potential ability to respond to antigens. Reduced diversity may indicate impaired immune function or clonal selection for specific antigens.
  • Track Clonal Dynamics: By comparing samples from different time points, tissues, or conditions, specific T/B cell clone expansion, contraction, or persistence can be monitored to understand the dynamic process of immune response.
  • Identify Disease-Associated Clones: In tumor immunotherapy, infectious diseases, and autoimmune disease research, identifying specific TCR/BCR clones associated with disease progression or treatment response provides clues for developing diagnostic markers and targeted therapies.
  • Understand Immune Response Mechanisms: By combining clonotype information with single-cell transcriptome data, each clone can be linked to the gene expression profiles of its cells, allowing deeper exploration of the functional states of different clones (e.g., effector, memory, or exhausted).
  • Evaluate Vaccine and Immunotherapy Effects: By analyzing changes in the immune repertoire before and after vaccination or treatment, the effectiveness of immune interventions can be assessed, and their mechanisms of action revealed.
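To make "diversity" and "clonal expansion" concrete, the sketch below groups cells into clonotypes and computes richness, Shannon entropy, and a normalized clonality score. It is a minimal illustration under a loudly stated assumption: a clonotype is defined here only by an identical CDR3 amino-acid string per cell, whereas real pipelines usually also consider V/J genes and paired chains. The function name is hypothetical, not a platform API.

```python
import math
from collections import Counter
from typing import Dict, Sequence

def clonotype_diversity(cdr3_per_cell: Sequence[str]) -> Dict[str, float]:
    """Richness, Shannon entropy, and clonality of clonotypes.

    Assumption (hypothetical simplification): one CDR3 amino-acid string per cell
    defines that cell's clonotype.
    """
    sizes = Counter(cdr3_per_cell)              # clone size of each clonotype
    total = sum(sizes.values())
    freqs = [n / total for n in sizes.values()]
    shannon = -sum(p * math.log(p) for p in freqs)
    richness = len(sizes)
    # Clonality = 1 - normalized entropy: 0 when every cell is a distinct clone,
    # closer to 1 when a few expanded clones dominate (1.0 for a single clone)
    clonality = 1 - shannon / math.log(richness) if richness > 1 else 1.0
    return {"richness": float(richness), "shannon": shannon, "clonality": clonality}
```

A repertoire in which every cell carries a different CDR3 gives a clonality near 0, while a repertoire dominated by one expanded clone gives a clearly positive value, which is the kind of shift the "Reduced diversity" point above refers to.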

Workflow and Interpretation (see ./SeekSoulOnline_guide.src/vdj.html)

Pseudotime Analysis

Definition

Pseudotime analysis is a computational method used to infer the relative progression of cells along a dynamic biological process (e.g., development, differentiation, disease progression, or response to stimuli) from single-cell transcriptomic data. By measuring transcriptomic similarity between cells, it embeds individual cells in a low-dimensional space and orders them according to their intrinsic continuity, assigning each cell a relative value (its pseudotime) that represents how far it has progressed. It is important to emphasize that pseudotime is not equivalent to real time; it reflects the relative position of cell states along an assumed trajectory.

Typical analysis workflows include: selecting ordering genes that represent the process, learning or fitting a principal graph structure (such as a tree or graph model) in low-dimensional space, and projecting cells onto this principal graph to calculate pseudotime values and identify branch points and terminal states.
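The "fit a graph, then project and measure" idea can be sketched in a few lines. This is not DDRTree or Monocle 3's principal graph: as a swapped-in simplification, it builds a plain minimum spanning tree (Prim's algorithm) over cells in an already-reduced embedding and defines pseudotime as path length from a user-chosen root cell. The function name and inputs are hypothetical.

```python
import math
from typing import List, Sequence

def mst_pseudotime(coords: Sequence[Sequence[float]], root: int) -> List[float]:
    """Pseudotime = path length from `root` along a minimum spanning tree of the cells.

    Prim's algorithm grows the tree outward from the root, so each newly attached
    cell's pseudotime is its parent's pseudotime plus the connecting edge. O(n^2).
    """
    n = len(coords)
    in_tree = [False] * n
    best = [math.inf] * n      # shortest edge linking each cell to the current tree
    parent = [-1] * n
    pseudotime = [0.0] * n
    best[root] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            pseudotime[u] = pseudotime[parent[u]] + best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(coords[u], coords[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return pseudotime
```

For five cells at (0,0), (1,0), (2,0), (2,1), and (3,0) with cell 0 as root, the tree branches at (2,0) and both tips receive pseudotime 3, illustrating how two terminal states can share a pseudotime value while lying on different branches. The result depends entirely on the chosen root, which is why start-point selection matters.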


Significance

  • Overcome Cell Asynchrony: Single-cell samples are usually composed of mixtures of cells at different stages of development or response. Pseudotime analysis can reorder these asynchronous cells according to their intrinsic process, revealing continuous biological changes.
  • Reconstruct Continuous Processes: Compared to discrete clustering, pseudotime can display continuous transition trajectories between cell states, helping to understand gradual changes from initial to terminal states.
  • Identify Transitional States and Rare Cells: It can detect intermediate state cells or rare transition cells between two stable states, which often carry key fate determination information.
  • Discover Dynamic Regulatory Genes and Modules: By analyzing gene expression changes along pseudotime, key regulatory factors upregulated or downregulated at different stages of the process and co-varying gene modules can be identified, providing candidate gene sets for subsequent mechanistic studies.
  • Reveal Branch Points and Fate Decisions: When the trajectory contains branch structures, pseudotime analysis can locate fate divergence points and compare gene expression differences between branches, thereby proposing verifiable hypotheses about cell fate choices.
  • Guide Experimental Design and Multi-Method Validation: Pseudotime results can guide validation experiments such as time-series sampling, intervention experiments, or lineage tracing; they can also be combined with other methods (such as CytoTRACE based on transcriptional diversity or scVelo based on RNA velocity) to enhance conclusion reliability.

TIP

Pseudotime analysis is particularly suitable for studying biological problems with continuous change processes, such as:

  • Stem cell differentiation and development.
  • Immune cell activation and exhaustion.
  • Disease (e.g., cancer) occurrence and evolution.
  • Cell response processes to drugs or environmental stimuli.

Methods and Selection Guide

Monocle 2
  • Core Principle: Pseudotime + DDRTree algorithm. Constructs a minimum spanning tree of cells in low-dimensional space through dimensionality reduction and graph learning; a classic method for resolving complex branches.
  • Pros: Stable and classic: constructed branch trajectories are clear and validated by extensive literature. Mature ecosystem: supported by numerous tutorials and papers.
  • Cons: Computational performance: slow for very large datasets (>100k cells), with high memory consumption. Start-point dependency: the user must specify the trajectory start point.
  • Recommended Scenario: Top recommendation. When you want to clearly show how cells differentiate from one state into multiple terminal states, Monocle 2 is a proven, reliable choice.

Monocle 3
  • Core Principle: Pseudotime + UMAP embedding. Learns a graph structure directly on a dimensionality-reduction embedding such as UMAP to infer trajectories; a newer strategy with a simpler workflow.
  • Pros: Fast and scalable: handles very large datasets, up to millions of cells. Good compatibility: interoperates with mainstream ecosystems such as Seurat.
  • Cons: Still developing: the algorithm is iterating rapidly, and its resolution of complex branches is sometimes less stable and clear than that of Monocle 2.
  • Recommended Scenario: An efficient alternative for very large datasets or relatively simple trajectory structures (e.g., linear or single-branch).

CytoTRACE
  • Core Principle: Transcriptional diversity. Core hypothesis: cells with higher differentiation potential express more genes, whereas more differentiated cells have more specialized expression patterns.
  • Pros: Unsupervised: no start point needs to be specified; differentiation potential is predicted automatically. Good at root finding: very effective at identifying the "root" of the trajectory (the most primitive cell population).
  • Cons: No trajectory graph: it mainly provides a ranking of differentiation potential (one score per cell) rather than a visual path graph.
  • Recommended Scenario: Strongly recommended when you are unsure which cell population is the start point, or when you want to objectively check whether the start point chosen for tools such as Monocle is correct.

scVelo
  • Core Principle: RNA velocity. Infers the direction of each cell's transcriptional change over the next few hours by quantifying the abundance of unspliced pre-mRNA and spliced mature mRNA.
  • Pros: Predicts the "future": reveals the instantaneous direction and rate of cell state transitions, adding dynamic information that a static snapshot lacks. Reveals cyclic processes: very effective for depicting cyclic processes such as the cell cycle.
  • Cons: High data requirements: requires high-quality data that effectively captures intronic reads. Complex interpretation: velocity vector fields can be noisy and need careful interpretation.
  • Recommended Scenario: Best choice when you want to know the "direction" and "rate" of cell state transitions, not just the "path". Also suitable for exploring dynamic equilibrium systems.
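CytoTRACE's core hypothesis (more detected genes suggests higher potential) can be illustrated with a rank-normalized gene count per cell. This sketch covers only that core signal; the published method adds further processing that is omitted here, and the function names are hypothetical.

```python
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """0-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2
        i = j + 1
    return ranks

def gene_count_potency(counts: Sequence[Sequence[float]]) -> List[float]:
    """counts[c][g]: expression of gene g in cell c.

    Returns a 0-1 score per cell; higher = more genes detected = higher predicted
    differentiation potential (candidate trajectory root).
    """
    detected = [sum(1 for x in row if x > 0) for row in counts]
    n = len(detected)
    if n == 1:
        return [1.0]
    return [r / (n - 1) for r in average_ranks(detected)]
```

Cells scoring near 1 are candidate roots, which is how such a score can be used to sanity-check a start point chosen for Monocle.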

Summary:

  • Standard Workflow: Monocle 2 (depict path) + CytoTRACE (determine start point).
  • Explore Direction and Rate: If data quality permits, use scVelo for deeper dynamic analysis.
  • Massive Datasets: Prioritize Monocle 3.
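To show what "direction" means in RNA velocity, here is a sketch of the classic steady-state model for a single gene: fit a degradation ratio gamma on cells assumed to be at steady state, then take velocity = u - gamma * s. Assumptions: the splicing rate is fixed to 1, and steady-state cells are chosen as those with the lowest and highest spliced abundance, a simplification of how scVelo selects extreme quantiles (scVelo also smooths counts over neighbors first). The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

def steady_state_velocity(u: Sequence[float], s: Sequence[float],
                          quantile: float = 0.05) -> Tuple[float, List[float]]:
    """Steady-state RNA velocity for one gene (simplified sketch).

    gamma is fitted by least squares through the origin (u ~ gamma * s) on cells
    with the lowest and highest spliced abundance, assumed near steady state.
    velocity = u - gamma * s: > 0 suggests induction, < 0 suggests repression.
    """
    n = len(s)
    k = max(1, int(n * quantile))
    order = sorted(range(n), key=lambda i: s[i])
    extremes = order[:k] + order[-k:]
    num = sum(u[i] * s[i] for i in extremes)
    den = sum(s[i] * s[i] for i in extremes)
    gamma = num / den if den > 0 else 0.0
    return gamma, [u[i] - gamma * s[i] for i in range(n)]
```

A cell with more unspliced RNA than the steady-state line predicts is read as switching the gene on; this per-gene signal, aggregated over many genes, produces the velocity vector fields mentioned above.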

Cell-Cell Communication Analysis

Definition

Cell-cell communication analysis infers signal exchange between cell populations via ligand-receptor (L-R) pairs and downstream signaling pathways from single-cell/spatial transcriptomic data. Its goal is not only to identify possible L-R pairs but also to quantify "sender/receiver" roles and pathway strengths at the pathway and system levels, providing candidate signaling molecules and mechanistic hypotheses for subsequent experimental validation.

TIP

Cell communication results are "predictive" rather than "causal" conclusions. Be sure to combine them with expression evidence, literature, and experimental validation (flow cytometry, in situ, or functional blocking experiments).


Common Software and Differences

CellChat (R)
  • Core Principle: Based on CellChatDB; accounts for multi-subunit complexes and cofactors, and combines network analysis with pathway summaries to quantify communication strength and cell roles.
  • Pros: Rich visualization (circle, chord, Sankey plots, etc.), pathway summaries, and sender/receiver role analysis; suitable for system-level comparison.
  • Cons: Relies on prior database coverage; demands substantial computational resources for very large datasets; sensitive to low expression and rare cells.
  • Recommended Scenario: First choice for system-level pathway comparison, sender/receiver role identification, and pathway prioritization.

CellPhoneDB (Python/CLI)
  • Core Principle: Represents a complex by the minimum expression of its subunits and assesses significance with a cell-label permutation test.
  • Pros: Handles multi-subunit complexes well, with a rigorous statistical method; mature database for human data.
  • Cons: The official focus is human data (other species require ortholog mapping); visualization and pathway summaries are relatively limited.
  • Recommended Scenario: First choice for strict L-R significance screening and conservative validation of results in human data.

NicheNet (R)
  • Core Principle: Integrates ligand-receptor, signaling, and TF→target gene networks, and uses network propagation to assess each ligand's regulatory potential on target genes.
  • Pros: Provides mechanistic ligand→target gene predictions and ligand activity scores; suitable for explaining differential expression in receiver cells.
  • Cons: Does not directly provide sender/receiver strength metrics between groups; usually requires differentially expressed genes as input.
  • Recommended Scenario: First choice when the question is whether a ligand can explain differentially expressed genes and downstream regulatory mechanisms in receiver cells.

Note: Differences in database coverage, complex annotation, and statistical strategies between tools can significantly affect detection results; cross-tool validation is recommended when reporting candidates.
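The permutation logic behind CellPhoneDB-style significance can be sketched as follows. Assumptions: the interaction score is the average of the ligand mean in the sender group and the receptor mean in the receiver group; a multi-subunit complex takes its lowest subunit mean; significance comes from shuffling cell labels. The real tool adds further filters (e.g., a minimum fraction of expressing cells) that are omitted, and all names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

# A partner is a list of subunits; each subunit is a per-cell expression vector.
Partner = Sequence[Sequence[float]]

def partner_mean(partner: Partner, cells: List[int]) -> float:
    """Mean expression in `cells`; a multi-subunit complex uses its lowest subunit mean."""
    return min(mean(subunit[i] for i in cells) for subunit in partner)

def interaction_score(lig: Partner, rec: Partner, labels: Sequence[str],
                      sender: str, receiver: str) -> float:
    s_cells = [i for i, lab in enumerate(labels) if lab == sender]
    r_cells = [i for i, lab in enumerate(labels) if lab == receiver]
    return (partner_mean(lig, s_cells) + partner_mean(rec, r_cells)) / 2

def permutation_test(lig: Partner, rec: Partner, labels: Sequence[str], sender: str,
                     receiver: str, n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Observed score and an empirical p-value from shuffling cell labels."""
    rng = random.Random(seed)
    observed = interaction_score(lig, rec, labels, sender, receiver)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes are preserved; only memberships change
        if interaction_score(lig, rec, shuffled, sender, receiver) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Because the null distribution is built from the data's own labels, such a p-value says the pair is group-specific, not that signaling occurs, which is why the results remain predictive rather than causal.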

Recommendations

  • For system-level visualization and pathway comparison (including sender/receiver analysis): Prioritize CellChat (excellent visualization and pathway summary capabilities).
  • For human data with a focus on statistical significance of L-R pairs: CellPhoneDB is a robust choice (emphasizes subunit minimum expression strategy and permutation tests).
  • For explaining differential genes in receiver cells via ligands: Use NicheNet to generate candidate lists of ligand→target genes and activity scores.
  • Comprehensive Strategy (Recommended): After software analysis, validate using wet lab experiments or spatial data.

Common Pitfalls

WARNING

  • Do not rely solely on "strength values" from a single software to draw experimental conclusions; different methods measure different meanings.
  • Avoid directly comparing values between different groups without unified annotation and preprocessing; such comparisons are prone to technical bias.

References

  • CellChat Official Repository: https://github.com/sqjin/CellChat
  • CellPhoneDB Official Repository: https://github.com/ventolab/CellphoneDB
  • NicheNet Official Repository: https://github.com/saeyslab/nichenetr

Copy Number Variation (CNV) Analysis

Definition

Copy Number Variation (CNV) is a type of genomic structural variation in which DNA segments larger than 1 kb are gained or lost relative to a reference genome. CNV is a major source of genomic variation within and between species and is closely associated with many diseases; in tumor research in particular, CNV is an important feature of tumor heterogeneity.

In single-cell RNA sequencing (scRNA-seq) data, CNV can be indirectly inferred by analyzing gene expression changes across continuous genomic regions. If the overall gene expression level in a chromosomal region increases or decreases, it may imply an increase or decrease in copy number in that region.
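The idea of reading copy number from "expression changes across continuous genomic regions" can be sketched as: log-transform, subtract each gene's baseline in reference (normal) cells, then average over neighboring genes along each chromosome. This is a simplified, InferCNV-inspired sketch rather than InferCNV itself (which adds further denoising steps); the function name is hypothetical, and window=101 genes is an assumed default.

```python
import math
from statistics import mean
from typing import List, Sequence

def infer_cnv_sketch(expr: Sequence[Sequence[float]], chroms: Sequence[str],
                     ref_cells: Sequence[int], window: int = 101) -> List[List[float]]:
    """expr[g][c]: normalized expression of gene g in cell c, genes sorted by genomic
    position; chroms[g]: chromosome of gene g. Returns smoothed relative expression."""
    n_genes, n_cells = len(expr), len(expr[0])
    # 1) log-transform, then subtract each gene's mean in the reference (normal) cells
    rel = []
    for g in range(n_genes):
        logged = [math.log2(x + 1) for x in expr[g]]
        baseline = mean(logged[c] for c in ref_cells)
        rel.append([v - baseline for v in logged])
    # 2) moving average over neighboring genes, never crossing a chromosome boundary
    half = window // 2
    smoothed = []
    for g in range(n_genes):
        lo = hi = g
        while lo > 0 and g - lo < half and chroms[lo - 1] == chroms[g]:
            lo -= 1
        while hi < n_genes - 1 and hi - g < half and chroms[hi + 1] == chroms[g]:
            hi += 1
        smoothed.append([mean(rel[k][c] for k in range(lo, hi + 1)) for c in range(n_cells)])
    return smoothed
```

Smoothing over many genes is what separates a regional shift (likely copy number) from the expression change of any single gene; positive runs suggest gains and negative runs suggest losses relative to the reference cells.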


Significance

Single-cell CNV analysis is extremely important in tumor research:

  • Distinguish Tumor and Non-Tumor Cells: Tumor cells are usually accompanied by extensive genomic instability, manifesting as widespread CNVs. Analyzing CNV patterns of single cells can effectively distinguish malignant tumor cells from normal cells (such as immune cells, stromal cells) mixed in tumor tissues.
  • Reveal Tumor Clonal Heterogeneity: Tumor tissues consist of sub-clones with different genomic characteristics. Single-cell CNV analysis can identify tumor sub-clones with different CNV patterns, revealing the clonal structure and evolutionary relationship within the tumor.
  • Identify Key Driver Genes: CNV analysis can locate chromosomal regions with frequent copy number gains or losses, which may contain key oncogenes or tumor suppressor genes, providing clues for finding new therapeutic targets.
  • Assess Tumor Evolution and Drug Resistance: By comparing tumor cell CNV patterns at different treatment stages or metastatic sites, tumor evolutionary paths can be tracked, and genomic variations related to drug resistance can be studied.

InferCNV vs. CopyKAT

InferCNV and CopyKAT are two widely used bioinformatics tools for inferring CNV from scRNA-seq data. They differ in algorithm principle, applicable scenarios, and result focus.

InferCNV
  • Core Principle: Infers CNVs by comparing tumor-cell expression profiles with a set of "normal" reference cells, using a sliding window to smooth gene expression along the genome.
  • Pros: Classic algorithm with reliable results; focuses on large-scale, chromosome-arm-level variation.
  • Cons: Requires high-quality normal reference cells; relatively low resolution; computationally intensive and time-consuming; results require user interpretation and cell grouping.
  • Recommended Scenario: Suitable when clear, reliable normal reference cells are available (e.g., adjacent normal tissue or immune cells) and the focus is on large-scale variation.

CopyKAT
  • Core Principle: Uses an integrative Bayesian approach with a mixture model over genomic position and expression to identify CNVs.
  • Pros: No predefined reference cells needed (identified automatically); higher resolution (~5 Mb); computationally efficient and fast; automatically predicts the malignant/normal status of each cell.
  • Cons: Accuracy depends on enough normal cells being present in the data to serve as an internal reference; automatic subclone division may oversimplify complex intratumor heterogeneity.
  • Recommended Scenario: Suitable when there is no clear normal control or when automatic separation of tumor and normal cells is desired. Its automation makes it the preferred option for exploratory analysis.

Selection Advice:

  • If your data contains clear, reliable normal cells as reference (e.g., non-epithelial cells from adjacent tissues, or clearly annotated immune cells), and you are more interested in large-scale chromosomal arm-level variations, InferCNV is a classic and reliable choice.
  • If your data lacks clear normal cell controls, or you want the algorithm to automatically distinguish tumor and normal cells, and you are interested in higher resolution CNV events, CopyKAT is a better choice. Its higher degree of automation makes results easier to interpret.
  • In actual analysis, both tools can be used simultaneously to cross-validate results for more reliable conclusions.

Regulatory Network Analysis

Definition

Regulatory network analysis infers gene regulatory relationships and identifies functional modules from single-cell transcriptomic data. By integrating gene expression correlation, transcription factor binding site information, and cell-type-specific expression patterns, it constructs cell-type-specific gene regulatory networks, revealing the transcriptional regulatory mechanisms that drive cell state transitions and functional differentiation.

Typical analysis workflows include: identifying co-regulated gene modules based on gene co-expression patterns (e.g., hdWGCNA), inferring regulatory relationships between transcription factors and target genes (e.g., SCENIC), and revealing the biological significance of modules through functional enrichment analysis.


Significance

  • Reveal Transcriptional Regulatory Mechanisms: Identifying key transcription factors and their target genes reveals mechanisms behind cell state transitions.
  • Identify Functional Gene Modules: Gene co-expression network analysis identifies gene modules co-expressed in specific cell types, providing clues for understanding cell functions.
  • Discover Cell-Type-Specific Regulators: Analyzing regulator activity differences across cell types identifies key cell-type-specific regulators.
  • Understand Molecular Basis of Heterogeneity: Helps understand gene regulatory differences behind cell heterogeneity, providing molecular basis for cell classification and annotation.
  • Guide Target Screening: Identifying key regulatory modules and factors provides candidate gene sets for subsequent functional validation and drug target screening.
  • Integrate Multi-Omics: Can integrate gene expression, epigenetics, and protein interaction data for comprehensive regulatory mechanism parsing.

TIP

Regulatory network analysis is particularly suitable for studying biological problems with complex regulatory relationships, such as:

  • Molecular mechanisms of cell fate determination.
  • Regulatory network rewiring of disease-associated genes.
  • Drug mechanism and resistance research.
  • Dynamic changes in transcriptional regulation during development.

Methods and Selection Guide

hdWGCNA
  • Core Principle: Gene co-expression network + module identification. Builds weighted gene co-expression networks from correlations and identifies functional modules by hierarchical clustering.
  • Pros: Modular analysis: identifies cell-type-specific functional modules. Eigengene extraction: quantifies module activity with module eigengenes. Functional enrichment: interprets module biology via GO/KEGG.
  • Cons: Indirect regulation: based on co-expression, so regulatory relationships cannot be verified directly. Parameter sensitive: results depend on the soft-thresholding power.
  • Recommended Scenario: Top recommendation. A mature, reliable choice for identifying co-expressed gene modules in cell types and understanding their functions.

SCENIC
  • Core Principle: Network inference + regulon activity. Infers TF-target relationships from co-expression, refines them with motif analysis, and scores regulon activity with AUCell.
  • Pros: Motif-supported regulation: retains TF-target links supported by motif enrichment (putative direct targets). Activity quantification: scores regulon activity in each cell via AUC. State identification: groups and annotates cells by network activity.
  • Cons: Computationally complex: complex workflow with high resource demand. High data quality: requires high-quality input data.
  • Recommended Scenario: Best choice for in-depth study of transcriptional regulatory mechanisms and identification of key TFs and their targets.
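Why hdWGCNA results are "affected by soft thresholding parameters" becomes clearer from the adjacency formula. The sketch below uses the signed adjacency a_ij = ((1 + cor_ij) / 2) ** beta, one of the network types WGCNA supports, and whole-network connectivity k_i = sum of a_ij. The default beta=6 is a hypothetical placeholder; in practice the power is chosen by testing candidate values. Function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)

def signed_adjacency(expr: Sequence[Sequence[float]], beta: int = 6) -> List[List[float]]:
    """a_ij = ((1 + cor_ij) / 2) ** beta for i != j (genes as rows); diagonal set to 0."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = max(-1.0, min(1.0, pearson(expr[i], expr[j])))  # guard float drift
            adj[i][j] = adj[j][i] = ((1 + r) / 2) ** beta
    return adj

def connectivity(adj: Sequence[Sequence[float]]) -> List[float]:
    """Whole-network connectivity k_i = sum over j of a_ij."""
    return [sum(row) for row in adj]
```

With beta = 6, a correlation of 0.5 maps to 0.75 ** 6 ≈ 0.18 while a correlation near 1 stays near 1, so raising beta suppresses weak links; modules and eigengenes are then derived from this weighted network.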

Gene Set Scoring Analysis

Definition

Gene set scoring evaluates the activity of predefined gene sets in single-cell transcriptomic data. By comprehensively scoring the activity for each cell or cell group, this method quantifies the enrichment of specific biological pathways, functions, or states. This scoring mechanism provides a powerful tool for revealing intrinsic cell heterogeneity and deeply exploring biological differences across cell states.


Significance

  • Functional Annotation: Annotate unknown cell groups to reveal their biological roles.
  • State Comparison: Compare activity changes of specific pathways under different conditions or cell types.
  • Heterogeneity Exploration: Discover functional heterogeneity within the same cell group.
  • Biological Insight: Provide molecular explanations for disease mechanisms, differentiation, drug response, etc.

Methods and Selection Guide

scMetabolism
  • Core Principle: Based on VISION and AUCell, with 78 built-in KEGG/REACTOME metabolic pathways.
  • Pros: Optimized for metabolism; intuitive results.
  • Cons: Limited to the 78 built-in pathways; not usable with other gene sets.
  • Recommended Scenario: Metabolic pathway focus. Recommended for quickly studying metabolic reprogramming.

GSVA
  • Core Principle: A non-parametric, unsupervised method that converts an expression matrix into an enrichment score matrix.
  • Pros: Broad applicability; supports custom and public gene sets; well suited to cross-sample comparison.
  • Cons: Interpretation depends on gene set quality; may be less sensitive on sparse data.
  • Recommended Scenario: General gene set variation analysis. Recommended for comprehensive, unbiased analysis with large databases such as MSigDB.

Scoring
  • Core Principle: Integrates AUCell, UCell, singscore, and AddModuleScore.
  • Pros: Highly flexible; supports custom gene sets and cross-validation across algorithms.
  • Cons: Requires the user to understand the underlying algorithms.
  • Recommended Scenario: Flexible tool. Recommended for custom gene sets or cross-validation with multiple algorithms.

Module Score
  • Core Principle: Uses Seurat's AddModuleScore function.
  • Pros: Simple and intuitive, with adjustable plotting.
  • Cons: Scores may be affected by gene set size and expression level.
  • Recommended Scenario: Quick assessment and visualization of gene set activity.
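Rank-based scores are one way to reduce the sensitivity to expression level noted for AddModuleScore. Below is a sketch of a Mann-Whitney U based per-cell score in the spirit of UCell: rank genes within the cell, cap ranks at max_rank, and normalize U to 0-1. Tie handling and defaults here are assumptions and may differ from the UCell package; the function name is hypothetical.

```python
from typing import Dict, Sequence

def rank_signature_score(cell_expr: Dict[str, float], gene_set: Sequence[str],
                         max_rank: int = 1500) -> float:
    """Mann-Whitney U based signature score for one cell (UCell-like sketch).

    Genes are ranked by expression (rank 1 = highest). Ranks beyond `max_rank`, and
    genes absent from the cell, are capped at max_rank + 1. Ties keep input order.
    """
    ranked = sorted(cell_expr, key=lambda g: -cell_expr[g])
    rank = {g: i + 1 for i, g in enumerate(ranked)}
    n = len(gene_set)
    r = [min(rank.get(g, max_rank + 1), max_rank + 1) for g in gene_set]
    u = sum(r) - n * (n + 1) / 2
    return 1 - u / (n * max_rank)
```

A score near 1 means the signature genes sit at the top of that cell's ranking; because only within-cell ranks are used, the score does not depend on sequencing depth or absolute expression scale.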

Perturbation Analysis

Definition

Perturbation analysis in single-cell transcriptomics assesses transcriptional response differences and sensitivity of cell populations or types under defined external/internal conditions (e.g., disease, drug, knockout, timepoint). The goal is to identify cell types or gene modules most responsive to conditions, revealing potential biological regulation and functional differences.


Significance

  • Identify Key Responsive Cell Types: Find cell types most sensitive to perturbation for functional validation.
  • Reveal Condition-Specific Mechanisms: Identify genes/pathways changed under perturbation to understand regulation.
  • Prioritize Resources: Prioritize cell subsets for in-depth validation in complex tissues.
  • Guide Intervention/Biomarkers: Provide basis for precision therapy and biomarker development.

Setup

  • Perturbation Factor: Corresponds to a metadata column (e.g., treatment, disease_status). Ensure clear, mutually exclusive labels (e.g., treated vs control).
  • Perturbation Objects: Target groups to compare (whole sample, specific cell type). Ensure sufficient cells (>100/group) and consider batch effects.

TIP

Workflow: Confirm the perturbation factor and groups, then run the analysis on biologically relevant cell types to obtain a robust prioritization.

  • Tool: Use Augur for prioritization, and cross-validate the results with differential expression, regulatory network, and cell communication analyses.
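Augur's prioritization rests on one idea: a cell type that responds strongly to a perturbation is one whose expression lets a classifier predict the condition label well, measured by cross-validated AUC. Augur itself trains random-forest classifiers with subsampling and cross-validation; as a swapped-in simplification, this sketch uses a leave-one-out nearest-centroid classifier. All names are hypothetical, and each condition needs at least two cells per cell type.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple

def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Probability a random treated cell outscores a random control cell (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def loo_centroid_auc(X: Sequence[Sequence[float]], y: Sequence[bool]) -> float:
    """Leave-one-out nearest-centroid classification; needs >= 2 cells per condition."""
    scores = []
    for i in range(len(X)):
        centroid = {}
        for cls in (True, False):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == cls]
            centroid[cls] = [mean(col) for col in zip(*rows)]
        # Positive score = held-out cell is closer to the treated centroid
        scores.append(math.dist(X[i], centroid[False]) - math.dist(X[i], centroid[True]))
    return auc(scores, y)

def prioritize(cells_by_type: Dict[str, Tuple[Sequence[Sequence[float]], Sequence[bool]]]
               ) -> List[Tuple[str, float]]:
    """Rank cell types by how separable treated vs control cells are (highest AUC first)."""
    ranking = {ct: loo_centroid_auc(X, y) for ct, (X, y) in cells_by_type.items()}
    return sorted(ranking.items(), key=lambda kv: kv[1], reverse=True)
```

An AUC near 0.5 means the condition barely changes that cell type's profile; this is also why the setup above asks for enough cells per group, since small groups make the AUC estimate unstable.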

ATAC Pseudotime Analysis

Definition

Chromatin accessibility patterns in scATAC-seq data vary continuously, so dimensionality reduction and graph learning can identify continuous paths in epigenetic space, construct directed developmental trajectories, and calculate pseudotime from a chosen start point. Single-cell multi-omics data contain both scRNA-seq and scATAC-seq measurements; because accessibility changes often precede transcriptional changes, ATAC pseudotime can reveal key regulatory events earlier than expression does, offering a view closer to the "cause" of state transitions.


Significance

  • Reveal Differentiation Continuity: Reorder asynchronous cells to reveal continuous change trajectories.
  • Capture Epigenetic Dynamics: Identify open/closed states of key regulatory elements (peaks) during differentiation.
  • Understand Fate Determination: Locate fate divergence points and identify epigenetic regulators driving fate choice.

Methods and Selection Guide

ATAC_Monocle3
  • Core Principle: Pseudotime + UMAP embedding. Based on Monocle 3; uses LSI preprocessing to handle scATAC sparsity and learns a graph on the UMAP embedding.
  • Pros: Fast and scalable: handles very large ATAC datasets. Epigenetics-specific: optimized for chromatin accessibility. Compatible: works with Seurat. Clear trajectories: identifies complex branches.
  • Cons: Start-point dependent: the user must specify the root. High data quality: requires high-quality scATAC data. Still developing: the algorithm is still iterating.
  • Recommended Scenario: Top recommendation for scATAC. A reliable choice for reconstructing differentiation trajectories based on chromatin accessibility.

Selection Advice:

  • scATAC-seq Data: ATAC_Monocle3 is optimized and recommended.
  • Multi-omics Integration: Use Monocle 3 and ATAC_Monocle3 separately, then integrate pseudotime results for a comprehensive view.

ATAC Copy Number Variation (CNV) Analysis

Definition

In scATAC-seq data, CNV can be indirectly inferred by analyzing read counts in genomic regions. If read counts in a region increase/decrease, it may imply copy number gain/loss.

Approaches:

  • Based on scRNA-seq: InferCNV, CopyKAT
  • Based on scATAC-seq: epiAneuFinder, AtaCNV, CopyscAT
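The read-count proxy described above can be sketched as: bin fragments into fixed genomic windows per cell, normalize by each cell's total, and compare every bin with its mean across cells as a log2 ratio. This is a bare illustration; the tools listed add blacklist filtering, GC correction, and segmentation that are omitted. The 100 kb bin size echoes the epiAneuFinder default mentioned below, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Bin = Tuple[str, int]  # (chromosome, bin index)

def bin_fragments(fragments: Iterable[Tuple[str, str, int]],
                  bin_size: int = 100_000) -> Dict[str, Dict[Bin, int]]:
    """fragments: (cell, chrom, start). Returns {cell: {(chrom, bin): count}}."""
    counts: Dict[str, Dict[Bin, int]] = defaultdict(lambda: defaultdict(int))
    for cell, chrom, start in fragments:
        counts[cell][(chrom, start // bin_size)] += 1
    return counts

def log2_ratios(counts: Dict[str, Dict[Bin, int]], bins: List[Bin],
                eps: float = 1e-9) -> Dict[str, Dict[Bin, float]]:
    """Depth-normalize each cell, then compare each bin with its mean across cells."""
    frac = {}
    for cell, per_bin in counts.items():
        total = sum(per_bin.values())
        frac[cell] = {b: per_bin.get(b, 0) / total for b in bins}
    bin_mean = {b: sum(f[b] for f in frac.values()) / len(frac) for b in bins}
    return {cell: {b: math.log2((f[b] + eps) / (bin_mean[b] + eps)) for b in bins}
            for cell, f in frac.items()}
```

Positive ratios hint at gains and negative ratios at losses, but single bins are noisy, so real methods look for consistent runs of bins before calling a CNV.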

Significance

  • Distinguish Tumor/Non-Tumor: Analyze CNV patterns to separate malignant cells.
  • Reveal Clonal Heterogeneity: Identify tumor sub-clones and evolutionary relationships.
  • Identify Drivers: Locate regions with frequent CNVs containing oncogenes/suppressors.
  • Integrate Epigenetics: Reveal how CNV affects chromatin openness.

Methods and Selection Guide

epiAneuFinder
  • Core Principle: Read counts + LSI. Uses read counts in genomic windows as a CNV proxy, with LSI preprocessing.
  • Pros: Mature: verified and stable. Good preprocessing: removes blacklisted regions. Adjustable window: 100 kb by default. Intuitive: clear heatmaps.
  • Cons: User parameters: the window size needs adjustment. Slow on large datasets. Reference dependent: needs a sufficient number of cells.
  • Recommended Scenario: Classic, reliable choice. Suitable for standardized analysis of tumor samples.

CopyscAT
  • Core Principle: Automatic control + multi-level detection. Infers CNVs from reads and automatically identifies non-tumor control cells.
  • Pros: Automatic control: no manual specification needed. Multi-level: detects local, segment-level, and arm-level CNVs. Double minutes: detects high-copy amplifications. Complex tumors: handles high heterogeneity.
  • Cons: High methodological complexity. Many parameters that require setup. Requires high-quality data.
  • Recommended Scenario: Choice for complex tumors. Best for highly heterogeneous samples or detection of local CNVs.

AtaCNV
  • Core Principle: High resolution + multiple normalization modes. Detects CNVs in 1 Mbp windows with 4 normalization modes.
  • Pros: High resolution: precise quantification. Flexible normalization: adapts to the data. Smoothing: removes GC bias. Automatic malignant-cell identification: via CNV burden. Joint segmentation: improves accuracy.
  • Cons: Mode selection is critical. Resource intensive. Demands high data quality.
  • Recommended Scenario: High-resolution choice. Best for precise quantification or when a normal reference is available.

Selection Advice:

  • Standard: epiAneuFinder.
  • Complex/Multi-level: CopyscAT.
  • High Res/Normal Ref: AtaCNV.
  • Exploratory: Start with epiAneuFinder/AtaCNV.
  • Integration: Combine with InferCNV/CopyKAT.

GeneActivity Analysis

Definition

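GeneActivity analysis infers potential transcriptional activity by quantifying chromatin accessibility over gene bodies and their upstream regulatory regions. Because scATAC data measure accessible regions rather than expression, this gene-level score bridges the gap between region-level accessibility and gene-level interpretation.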


Significance

  • Infer Potential Activity: Basis for functional interpretation without RNA data.
  • Assist Annotation: Improve ATAC-based cell type annotation.
  • Identify Differential Accessibility: Find genes with significant activity differences.
  • Integrate Multi-omics: Compare RNA expression and ATAC accessibility to reveal regulation.

Methods

Steps:

  1. Define Region: Gene body + promoter (TSS upstream 2kb).
  2. Count Fragments: Count fragments in regions per cell.
  3. Normalize: Generate GeneActivity matrix.

Tool: Use GeneActivity for calculation, differential analysis, and correlation.

Peak2Gene Analysis

Definition

Peak2Gene analysis identifies significant regulatory relationships between gene expression and nearby chromatin accessibility peaks in multi-omics data. It calculates correlation and corrects for biases (GC, length, distance) to infer which peaks regulate which genes.


Significance

  • Identify Peak-Gene Relations: Link peaks to genes by correlation rather than by linear genomic distance alone.
  • Construct Cis-Regulatory Networks: Reveal enhancer-gene links.
  • Correct Bias: Reliable associations via GLM.
  • Integrate Multi-omics: Combine with motif analysis for TF→peak→gene axis.

Methods

Steps:

  1. Filter Nearby Peaks: e.g., ±500kb.
  2. Calculate Correlation: Between peak accessibility and gene expression.
  3. Correct Bias: GC, accessibility, length.
  4. Assess Significance: Filter significant links.
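The four steps can be sketched for one gene: keep peaks within ±500 kb of the TSS, correlate each peak's accessibility with the gene's expression, and express that correlation as a z-score against background peaks from other chromosomes. Assumption: for brevity the background is matched on GC content only, whereas the step list above also corrects for accessibility and length. All names are hypothetical.

```python
import math
import random
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)

def link_peaks(tss: Tuple[str, int], expr: Sequence[float],
               peaks: List[Tuple[str, str, int]], acc: Dict[str, Sequence[float]],
               gc: Dict[str, float], window: int = 500_000, n_bg: int = 50,
               gc_tol: float = 0.05, seed: int = 0) -> List[Tuple[str, float, float]]:
    """Return (peak_id, r, z) for peaks near the gene's TSS.

    peaks: (peak_id, chrom, center). z compares r with correlations of background
    peaks on other chromosomes whose GC fraction is within `gc_tol`.
    """
    rng = random.Random(seed)
    chrom, pos = tss
    links = []
    for pid, pchrom, center in peaks:
        if pchrom != chrom or abs(center - pos) > window:
            continue  # step 1: keep nearby peaks only
        r = pearson(acc[pid], expr)  # step 2: peak-gene correlation
        pool = [q for q, qchrom, _ in peaks
                if qchrom != chrom and abs(gc[q] - gc[pid]) <= gc_tol]
        if len(pool) < 2:
            continue  # too few matched background peaks to estimate a null
        bg = [pearson(acc[q], expr) for q in rng.sample(pool, min(n_bg, len(pool)))]
        sd = pstdev(bg)
        if sd > 0:  # steps 3-4: background-corrected z-score
            links.append((pid, r, (r - mean(bg)) / math.sqrt(sd * sd)))
    return links
```

Links with a large positive z can then be filtered at a chosen threshold; the background comparison is what keeps generally accessible or GC-rich peaks from appearing linked to every gene.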

Tool: Use Peak2Gene.

Spatial Niche Analysis

Definition

Spatial Niche Analysis identifies, characterizes, and compares spatial domains or cell niches in spatial transcriptomics. These domains consist of cell populations with specific mixing patterns, reflecting functional organization.


Significance

  • Reveal Organization: Understand functional partitioning.
  • Discover Interactions: Cell mixing reflects interaction.
  • Understand Pathology: Reveal disease-associated spatial remodeling.
  • Guide Therapy: Provide spatial info for precision medicine.

Methods and Selection Guide

Banksy
  • Core Principle: Spatial neighborhood feature fusion. Combines each cell's expression with features of its spatial neighbors before clustering.
  • Pros: Integrates space: balances expression and location. Adjustable: controlled by the lambda parameter. Mature: clustering based on Leiden/Louvain. Rich visualization: spatial and UMAP plots.
  • Cons: Parameter sensitive: needs tuning. Complexity: slow on large datasets.
  • Recommended Scenario: Top recommendation. Best for understanding spatial organization while balancing expression and location.

CellCharter
  • Core Principle: Multi-sample domain identification. Uses scVI/scArches embeddings and Gaussian mixture models (GMM) for clustering; supports multiple samples.
  • Pros: Multi-sample: handles batch effects. Scalable: millions of cells. Multi-modal: supports multi-omics data. Automatic optimization: finds the best number of clusters (k). GPU acceleration: fast.
  • Cons: Complex: based on deep learning. Setup: requires configuration. Interpretation: requires biological knowledge.
  • Recommended Scenario: Efficient for multi-sample or very large datasets.
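The "fuse own expression with neighbor features" principle and the role of lambda can be sketched as a weighted concatenation: sqrt(1 - lambda) times a cell's own expression, plus sqrt(lambda) times the mean expression of its k nearest spatial neighbors. This is a simplified, BANKSY-inspired sketch (unweighted neighbor mean, no additional gradient-type features); the function name is hypothetical.

```python
import math
from typing import List, Sequence

def banksy_like_features(expr: Sequence[Sequence[float]], coords: Sequence[Sequence[float]],
                         k: int = 2, lam: float = 0.2) -> List[List[float]]:
    """Concatenate sqrt(1 - lam) * own expression with sqrt(lam) * mean expression
    of the k nearest spatial neighbors (unweighted; simplified sketch)."""
    n, n_genes = len(expr), len(expr[0])
    features = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nbrs = sorted(others, key=lambda j: math.dist(coords[i], coords[j]))[:k]
        nbr_mean = [sum(expr[j][g] for j in nbrs) / len(nbrs) for g in range(n_genes)]
        features.append([math.sqrt(1 - lam) * x for x in expr[i]]
                        + [math.sqrt(lam) * x for x in nbr_mean])
    return features
```

Clustering (e.g., Leiden) then runs on these fused features. With lambda = 0 this reduces to expression-only clustering; larger values let neighborhoods dominate, which is why the BANKSY authors suggest a small lambda (around 0.2) for cell typing and a larger one (around 0.8) for domain segmentation.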

Selection Advice:

  • Standard: Banksy.
  • Multi-sample/Large: CellCharter.

Spatial Cell-Cell Communication Analysis

Definition

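Spatial cell-cell communication analysis infers L-R signaling while incorporating spatial location. Unlike traditional (non-spatial) analysis, it accounts for cell proximity, distance, and ligand diffusion, which helps filter false positives and infer signaling direction and strength.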


Significance

  • Reveal Real Interactions: Filter out L-R pairs whose expressing cells are too far apart to interact.
  • Infer Direction: Identify senders and receivers (e.g., COMMOT).
  • Understand Microenvironment: Reveal signaling patterns specific to tumor or immune regions.
  • Discover Diffusion: Capture long-range, diffusible signals.

Methods and Selection Guide

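| Tool | Core Principle | Pros | Cons | Recommended Scenario |
| --- | --- | --- | --- | --- |
| COMMOT | Optimal transport: models ligand distribution to infer signaling direction and downstream response. | Direction: infers signal flow; Diffusion: models signal spread; Downstream: predicts downstream target responses | Complex: slow; Parameters: signaling type must be specified; Data: needs spatial coordinates | Directional signaling studies. Best for development and morphogenesis. |
| CellPhoneDB_spatial | Spatially constrained permutation: tests significance within spatial microenvironments. | Authoritative database: high-confidence interactions; Complexes: handles multi-subunit complexes; Strict statistics: reliable; Microenvironments: compares domains | No direction: reports L-R pairs only; Dependency: needs predefined microenvironments (domains); Species: database is mostly human | Strict validation. Best for high-confidence L-R identification. |
| CellChat_spatial | Spatially constrained mass action: adjusts communication probability by distance. | Pathway level: system-level view; Roles: identifies senders and receivers; Rich visualization: many plot types; Flexible: adjustable distance factor | No spatial direction: signaling is summarized at the cell-type level; Parameters: many to set; Slow: on large datasets | System-level pathway analysis. Best for comparing communication networks. |
| stLearn | SCTP (spatially constrained two-level permutation) + SME (spatial morphological gene expression normalization): integrates spatial location, tissue morphology, and expression. | Low false positives: two-level permutation; SME: normalizes expression using spatial neighbors and tissue morphology; Specific: only considers proximal spots | Single sample: analyzes one sample at a time; Visium: optimized for spot-based data | Visium validation. Best for spot-based data. |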

Selection Advice:

  • Direction: COMMOT.
  • Strict Validation: CellPhoneDB_spatial.
  • System View: CellChat_spatial.
  • Visium: stLearn.
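The idea of a spatially constrained permutation test, as listed for CellPhoneDB_spatial, can be sketched as follows: a sender/receiver cell-type pair is only tested if the two types share a microenvironment, and significance comes from shuffling cell-type labels among the cells in those microenvironments. This is a simplified illustration under assumed inputs (per-cell labels, microenvironment assignments, and single-gene ligand/receptor values); it is not CellPhoneDB's implementation, which, for example, also handles multi-subunit complexes.

```python
import random
from statistics import mean

def lr_mean(labels, ligand, receptor, sender, receiver):
    """Average of mean ligand expression in sender cells and mean receptor expression in receiver cells."""
    lig = [x for x, lab in zip(ligand, labels) if lab == sender]
    rec = [x for x, lab in zip(receptor, labels) if lab == receiver]
    return (mean(lig) + mean(rec)) / 2

def constrained_perm_test(labels, niches, ligand, receptor, sender, receiver,
                          n_perm=1000, seed=0):
    """Permutation p-value for one sender -> receiver L-R pair.

    Returns None when the two cell types never share a microenvironment, so
    spatially implausible pairs are not tested at all.
    """
    shared = ({n for n, lab in zip(niches, labels) if lab == sender}
              & {n for n, lab in zip(niches, labels) if lab == receiver})
    if not shared:
        return None
    # Keep only cells inside the shared microenvironments
    idx = [i for i, n in enumerate(niches) if n in shared]
    labs = [labels[i] for i in idx]
    lig = [ligand[i] for i in idx]
    rec = [receptor[i] for i in idx]
    observed = lr_mean(labs, lig, rec, sender, receiver)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = labs[:]
        rng.shuffle(shuffled)  # null model: cell-type labels carry no information
        if lr_mean(shuffled, lig, rec, sender, receiver) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids reporting p = 0

labels = ["T"] * 5 + ["Macro"] * 5 + ["B"] * 5
niches = ["tumor"] * 10 + ["follicle"] * 5
ligand = [10.0] * 5 + [0.0] * 10
receptor = [0.0] * 5 + [10.0] * 5 + [0.0] * 5
print(constrained_perm_test(labels, niches, ligand, receptor, "T", "Macro", n_perm=200))
print(constrained_perm_test(labels, niches, ligand, receptor, "T", "B"))  # never co-located, so not tested
```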

Spatial Co-localization Analysis

Definition

Quantifies the spatial proximity of biological entities (cell types or molecules). It identifies aggregation (positive association) or repulsion (negative association), providing spatial evidence for potential interactions.
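A common way to quantify co-localization is a label-permutation test on neighbor counts: count how often two cell types sit within a given radius, then compare that count with the counts obtained after randomly shuffling cell-type labels over fixed positions. The sketch below follows that general idea; the radius, permutation count, and function names are illustrative assumptions, and it is not the algorithm of MISTy or SpaGene.

```python
import math
import random
from statistics import mean, pstdev

def count_pairs(coords, labels, a, b, radius):
    """Number of ordered (type a, type b) cell pairs lying within `radius` of each other."""
    n = 0
    for i, (ci, li) in enumerate(zip(coords, labels)):
        if li != a:
            continue
        for j, (cj, lj) in enumerate(zip(coords, labels)):
            if i != j and lj == b and math.dist(ci, cj) <= radius:
                n += 1
    return n

def colocalization_z(coords, labels, a, b, radius, n_perm=500, seed=0):
    """Z-score of observed a-b neighbor pairs against randomly permuted labels.

    Positive values suggest aggregation (co-localization); negative values suggest
    avoidance. Cell positions stay fixed; only the labels are shuffled.
    """
    observed = count_pairs(coords, labels, a, b, radius)
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        null.append(count_pairs(coords, shuffled, a, b, radius))
    sd = pstdev(null)
    return 0.0 if sd == 0 else (observed - mean(null)) / sd

# 20 cells on a line: types A and B interleaved at one end, type C elsewhere
coords = [(x, 0) for x in range(20)]
labels = ["A", "B"] * 3 + ["C"] * 14
print(colocalization_z(coords, labels, "A", "B", radius=1))  # expected to be clearly positive
print(colocalization_z(coords, labels, "A", "C", radius=1))  # expected to be negative
```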

Significance

  • Parse Microenvironment: Reveal functional zones.
  • Identify Interactions: Proximity suggests a potential interaction.
  • Assess Differences: Compare co-localization across conditions.

Methods and Selection Guide

| Tool | Core Principle | Pros | Cons | Recommended Scenario |
| --- | --- | --- | --- | --- |
| MISTy | Multi-view modeling: explains the variation of each feature using intracellular and intercellular (neighborhood) views. | Multi-view: reveals complex regulatory relationships; Cell-type level: assesses how cell types influence each other; Scalable: extends to multi-modal data | Annotation: needs high-quality cell annotation; Slow: model training takes time; Complex: results need careful interpretation | Cell-type/module modeling. Best for quantifying spatial influence. |
| SpaGene | L-R co-localization: statistically identifies spatially proximate L-R pairs. | Focus: L-R pairs; Strict: permutation test; Comparative: across groups | Threshold: sensitive to the expression cutoff; Distance: sensitive to the neighborhood radius; Limited evidence: provides spatial co-localization only, not proof of interaction | L-R co-localization. Best for screening signaling pairs. |

Selection Advice:

  • Interaction Strength: MISTy.
  • L-R Pairs: SpaGene.

References

  • Tanevski J., Flores R.O.R., Gabor A., Schapiro D., Saez-Rodriguez J. Explainable multiview framework for dissecting spatial relationships from highly multiplexed data. Genome Biology 23, 97 (2022).
  • Liu Q., Hsu C.-Y., Shyr Y. Scalable and model-free detection of spatial patterns and colocalization. Genome Research 32, 1736–1745 (2022).
  • Armingol E. et al. Deciphering cell–cell interactions and communication from gene expression. Nature Reviews Genetics 22, 71–88 (2021).

Mutation Analysis

Definition

Detects single-nucleotide variants (SNVs) and insertions/deletions (indels) from scRNA-seq data. Integrating the mutation and expression matrices locates mutant cells and assesses the functional changes associated with the mutations.
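The cell-level step of locating mutant cells and linking them to expression can be sketched as below, assuming per-cell reference/alternate read counts at one variant site are already available. The thresholds, input layout, and function names are hypothetical and are not the platform's `mut` tool. Because scRNA-seq coverage is sparse, a cell without alternate reads is only weak evidence of a wild-type genotype, so low-coverage cells are left uncalled.

```python
from statistics import mean

def call_genotype(ref_reads, alt_reads, min_depth=2, min_alt=1):
    """Classify one cell at one variant site from its reference/alternate read counts."""
    if ref_reads + alt_reads < min_depth:
        return "unknown"  # too few reads to call; common with sparse scRNA-seq coverage
    return "mutant" if alt_reads >= min_alt else "wildtype"

def mean_expr_by_genotype(site_counts, expr, **thresholds):
    """Mean expression of one gene in mutant vs. wild-type cells.

    site_counts: {barcode: (ref_reads, alt_reads)} at one site
    expr:        {barcode: expression value of the gene of interest}
    """
    groups = {"mutant": [], "wildtype": []}
    for bc, (ref, alt) in site_counts.items():
        g = call_genotype(ref, alt, **thresholds)
        if g in groups and bc in expr:
            groups[g].append(expr[bc])
    return {g: mean(v) for g, v in groups.items() if v}

site_counts = {"cell1": (5, 3), "cell2": (8, 0), "cell3": (1, 0)}
expr = {"cell1": 4.0, "cell2": 1.0, "cell3": 9.0}
print(mean_expr_by_genotype(site_counts, expr))  # cell3 is left out as "unknown"
```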

Significance

  • Identify Mutant Cells: Distinguish wild-type from mutant cells.
  • Link to Function: Assess the impact on gene expression and pathways.
  • Discover Drivers: Identify enriched mutations that may act as drivers.
  • Understand Evolution: Track clonal dynamics.

Tool: Use mut.

lncRNA-mRNA Co-expression Analysis

Definition

Identifies expression correlations between lncRNAs and mRNAs to infer lncRNA regulatory functions and co-expressed functional modules.
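A minimal sketch of the correlation step, assuming a cells × genes expression table already split into lncRNAs and mRNAs. Spearman correlation and the |rho| ≥ 0.5 cutoff are illustrative choices; the correlation measure and thresholds used by the platform's LncCoExpression tool are not specified here. With many genes, the correlations would typically also need multiple-testing correction before interpretation.

```python
import math

def rank(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for constant vectors; treated as 0 here
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))

def coexpressed_pairs(lnc, mrna, min_abs_rho=0.5):
    """Return (lncRNA, mRNA, rho) pairs with |rho| >= min_abs_rho, strongest first.

    lnc, mrna: {gene name: [expression in each cell]}, all measured over the same cells.
    """
    hits = [(lnc_name, mrna_name, spearman(lnc_vals, mrna_vals))
            for lnc_name, lnc_vals in lnc.items()
            for mrna_name, mrna_vals in mrna.items()]
    hits = [h for h in hits if abs(h[2]) >= min_abs_rho]
    return sorted(hits, key=lambda h: -abs(h[2]))

# Hypothetical gene names and expression values over five cells
lnc = {"LINC-X": [1, 2, 3, 4, 5]}
mrna = {"GENE-A": [2, 4, 6, 8, 10], "GENE-B": [5, 4, 3, 2, 1], "GENE-C": [3, 1, 4, 1, 5]}
print(coexpressed_pairs(lnc, mrna))  # GENE-C falls below the cutoff
```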

Significance

  • Reveal Function: Infer lncRNA targets.
  • Discover Modules: Identify co-expressed networks.
  • Identify Markers: Cell-type-specific lncRNAs.

Tool: Use LncCoExpression.

Fusion Analysis

Definition

Detects gene fusion events from scRNA-seq data (e.g., with STAR-Fusion), identifies fusion-carrying cells, and assesses their distribution and functional impact.
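Identifying fusion-carrying cells amounts to assigning fusion-supporting reads to cell barcodes and calling a cell fusion-positive once it has enough support; the distribution across clusters then follows directly. The sketch below assumes the supporting reads have already been tagged with cell barcodes. The input format, the `min_reads` threshold, and the function names are hypothetical and are not the platform's `Fusion` tool; example fusion names are written as GENE1--GENE2.

```python
from collections import Counter

def fusion_positive_cells(support_reads, fusion, min_reads=1):
    """Barcodes with at least `min_reads` reads supporting `fusion`.

    support_reads: iterable of (barcode, fusion_name) pairs, one per supporting read.
    """
    counts = Counter(bc for bc, name in support_reads if name == fusion)
    return {bc for bc, n in counts.items() if n >= min_reads}

def fraction_by_cluster(positive, clusters):
    """Fraction of fusion-positive cells per cluster; clusters maps barcode -> cluster label."""
    total, pos = Counter(), Counter()
    for bc, cl in clusters.items():
        total[cl] += 1
        if bc in positive:
            pos[cl] += 1
    return {cl: pos[cl] / total[cl] for cl in total}

reads = [("cell1", "BCR--ABL1"), ("cell1", "BCR--ABL1"),
         ("cell2", "BCR--ABL1"), ("cell3", "EML4--ALK")]
clusters = {"cell1": "0", "cell2": "0", "cell3": "1", "cell4": "1"}
positive = fusion_positive_cells(reads, "BCR--ABL1", min_reads=2)
print(positive, fraction_by_cluster(positive, clusters))
```

Raising `min_reads` trades sensitivity for specificity; with sparse single-cell coverage, many truly fusion-positive cells may have only one supporting read.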

Significance

  • Identify Fusion Cells: Distinguish fusion-positive from fusion-negative cells.
  • Link to Function: Assess the impact on gene expression and pathways.
  • Discover Drivers: Identify disease-associated fusions.

Tool: Use Fusion.

Alternative Splicing Analysis

Definition

Compares alternative splicing patterns (e.g., skipped exons) across conditions to identify differentially regulated splicing events.
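Splicing changes are commonly summarized as percent spliced in (PSI). For a skipped-exon event, inclusion reads support the exon being present and skipping reads support its exclusion; because the two isoforms offer different numbers of junction positions, counts are normalized by effective lengths. The sketch below computes such a length-normalized inclusion level, modeled on the inclusion level rMATS reports as we understand it, plus a simple difference in mean PSI between two groups. rMATS's statistical model for testing differences between conditions is not reproduced, and the function names and example numbers are hypothetical.

```python
from statistics import mean

def inclusion_level(inc_reads, skip_reads, inc_len, skip_len):
    """Length-normalized percent spliced in (PSI) for one skipped-exon event.

    inc_len and skip_len are the effective lengths of the inclusion and skipping
    isoforms; returns None when no reads cover the event.
    """
    inc = inc_reads / inc_len
    skip = skip_reads / skip_len
    if inc + skip == 0:
        return None
    return inc / (inc + skip)

def delta_psi(group1, group2, inc_len, skip_len):
    """Mean PSI of group1 minus mean PSI of group2; groups are lists of (inc, skip) counts."""
    def group_mean(group):
        vals = [inclusion_level(i, s, inc_len, skip_len) for i, s in group]
        return mean(v for v in vals if v is not None)
    return group_mean(group1) - group_mean(group2)

# Inclusion isoform assumed to have twice the effective length of the skipping isoform
print(inclusion_level(20, 10, inc_len=2, skip_len=1))  # -> 0.5
print(delta_psi([(20, 10), (40, 10)], [(0, 10)], inc_len=2, skip_len=1))
```

Without the length normalization, the first example would give 20 / 30 ≈ 0.67 and overstate exon inclusion.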

Significance

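  • Qualitative Change: Captures changes in which isoforms are produced, complementing gene-level expression analysis ("Quantitative Change").
  • Reveal Heterogeneity: Identify isoform switching between cell populations or conditions.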

Tool: Use rMATS.